Abstract: MapReduce is a programming framework, implemented in the open-source Hadoop platform, for processing and generating terabyte-scale data sets in a distributed manner on large clusters. A primary performance goal is to minimize the completion time (makespan) of large sets of MapReduce jobs. A Hadoop cluster, however, uses a predefined slot configuration that stays fixed for the lifetime of the cluster. This fixed configuration can lead to a long makespan and low utilization of system resources. Our proposed scheme allocates resources to MapReduce tasks dynamically: it adjusts the ratio of map slots to reduce slots using workload information gathered from recently completed tasks. We also discuss several scheduling methods that aim to reduce completion time.
Keywords: MapReduce, Makespan, Workload, Dynamic Slot Allocation.
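
The idea of adjusting the map/reduce slot ratio from recent workload information can be illustrated with a minimal sketch. It is not the scheme proposed in this paper. It assumes, purely for illustration, that slots are divided in proportion to the estimated remaining work of each task type (pending task count times the mean duration of recently completed tasks). The function name `split_slots` and all of its parameters are hypothetical and do not correspond to any Hadoop API.

```python
from statistics import mean


def split_slots(total_slots, recent_map_secs, recent_reduce_secs,
                pending_maps, pending_reduces):
    """Return (map_slots, reduce_slots) for one node.

    Assumption (illustrative only): slots are split in proportion to the
    estimated remaining work of each task type, where remaining work is
    pending task count * mean duration of recently completed tasks.
    """
    if total_slots < 2:
        raise ValueError("need at least one map slot and one reduce slot")

    # Estimate remaining work from the recently completed tasks.
    map_work = pending_maps * mean(recent_map_secs) if recent_map_secs else 0.0
    reduce_work = (pending_reduces * mean(recent_reduce_secs)
                   if recent_reduce_secs else 0.0)
    total_work = map_work + reduce_work

    if total_work == 0:
        # No workload information yet: fall back to an even split.
        map_slots = total_slots // 2
    else:
        map_slots = round(total_slots * map_work / total_work)

    # Always keep at least one slot of each kind so neither phase starves.
    map_slots = max(1, min(total_slots - 1, map_slots))
    return map_slots, total_slots - map_slots


if __name__ == "__main__":
    # 90 pending maps at about 10 s each, 10 pending reduces at about 30 s each.
    print(split_slots(8, [10, 10], [30, 30], 90, 10))
```

Under these assumptions the split would be recomputed whenever new task completions are reported, so the ratio follows the workload rather than staying fixed for the lifetime of the cluster.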